add qwen3 moe #631
base: main
Conversation
I have read the CLA Document and I hereby sign the CLA. 1 out of 2 committers have signed the CLA.
I have read the CLA Document and I hereby sign the CLA.
recheck
# Add expert weights
for expert_idx in range(num_experts):
The parameter order in the expert loops is inconsistent between implementations: in Qwen3MoeModuleArchitecture the order is up_proj, gate_proj, down_proj, while in KORMoMoeModuleArchitecture it is gate_proj, up_proj, down_proj. Align the parameter order in both implementations so that expert weights are processed the same way across architectures.
Spotted by Graphite Agent
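One way to address this, shown only as a minimal sketch rather than the PR's actual code, is to define the canonical expert parameter order once and have both architecture classes iterate over it. The tensor name pattern and helper name below are assumptions based on the snippets in this review.

import typing

# Assumed canonical order shared by both architectures (hypothetical constant).
EXPERT_PARAM_ORDER = ("gate_proj", "up_proj", "down_proj")

def iter_expert_weight_names(layer_idx: int, num_experts: int) -> typing.Iterator[str]:
    """Yield expert weight tensor names for one layer in a single, fixed order."""
    for expert_idx in range(num_experts):
        for param in EXPERT_PARAM_ORDER:
            yield f"model.layers.{layer_idx}.mlp.experts.{expert_idx}.{param}.weight"

With a shared helper like this, neither Qwen3MoeModuleArchitecture nor KORMoMoeModuleArchitecture would hard-code its own loop order, so the two could not drift apart again.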
tqdm.tqdm(router_weights, desc="Router weights")
):
writer.save_tensor(
    f"model.layers.{layer_idx}.mlp.gate.linear.weight",
Bug: Router Gate Weight Naming Mismatch
KORMoMoeModuleArchitecture expects router gate weights to be named mlp.gate.weight, but the KORMoMoE saving logic adds a .linear suffix, producing mlp.gate.linear.weight. This naming mismatch prevents the model from loading correctly.
Additional Locations (1)
tqdm.tqdm(router_weights, desc="Router weights")
):
writer.save_tensor(
    f"model.layers.{layer_idx}.mlp.gate.linear.weight",
The tensor path in kormo.py should be model.layers.{layer_idx}.mlp.gate.weight rather than model.layers.{layer_idx}.mlp.gate.linear.weight, to stay consistent with the MoEGate implementation in the modeling file. The current path saves the router weights at the wrong location, making them inaccessible to the model during loading.
f"model.layers.{layer_idx}.mlp.gate.linear.weight", | |
f"model.layers.{layer_idx}.mlp.gate.weight", |
Spotted by Graphite Agent
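As a sketch of the corrected save path, assuming the loop structure visible in the snippet above and a writer object exposing a save_tensor(name, tensor) method as used there (function name and wrapper are hypothetical):

import tqdm

def save_router_weights(writer, router_weights):
    """Save per-layer router (gate) weights under the name MoEGate expects."""
    for layer_idx, weight in enumerate(
        tqdm.tqdm(router_weights, desc="Router weights")
    ):
        writer.save_tensor(
            # No ".linear" suffix, so the tensor name matches "mlp.gate.weight"
            # as registered by MoEGate in the modeling file.
            f"model.layers.{layer_idx}.mlp.gate.weight",
            weight,
        )

A quick sanity check after conversion is to load the saved state dict and confirm every name containing "mlp.gate" ends in "gate.weight" rather than "gate.linear.weight".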
Add support for Qwen3 MoE conversion.
Modified files: